[release/9.0] [NativeAOT] Introduce pointer-based CompareExchange intrinsic and use operating with syncblock bits. #106727
Backport of #106703 to release/9.0
/cc @VSadov
Customer Impact
When a NativeAOT app is compiled for Debug, use of monitor locks (as in a lock(obj){...} statement) could in rare cases lock the wrong object or cause a crash in the GC. This was causing intermittent build failures in CI, about once a day, once the bug made it into the toolset and ILC itself was impacted.
Regression
The root cause is pre-9.0, but was not exposed as a bug due to several mitigating factors.
Making the CompareExchange intrinsic self-referential in #92974 (Implement Interlocked for small types) indirectly exposed the bug.
Since this is a stress bug that occurs relatively rarely, it took a while for the issue to propagate to the toolset compiler and then present itself in the form of CI build failures caused by ILC crashing.
Testing
Verified locally by examining the codegen for both x64 and arm64, and by running a directed repro in a loop 300 times.
(With the directed repro and without the fix, a crash happens after 5-10 iterations.)
There is no way to add a regression test for this right now. We mostly rely on CoreCLR for GC-stress, but this scenario is one of a few cases specific to NativeAOT and not reachable in CoreCLR stress tests.
We will be discussing how to improve our GC-stress situation with NativeAOT.
Risk
Low. The actual change is fairly small. We rely on the existing JIT code that expands the CompareExchange intrinsic for ref int, and just introduce an internal entry point that takes int* as the location, so that an unsafe GC byref to an object syncblock has no way to show up. Regardless of whether the CompareExchange intrinsic is expanded or not, the new entry point has no byrefs in it. The emitted code is the same as before, but without GC tracking of the location argument.
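To illustrate the shape of the change, here is a minimal C# sketch of a pointer-based compare-exchange used to update bits in a header-like word. It is not the code from this PR: the class name SyncBitsSketch and the methods TrySetBits and Demo are hypothetical, and the CompareExchange overload below is only a stand-in for the internal intrinsic. The stand-in forwards to the public ref-based Interlocked.CompareExchange so the sample can run outside the runtime, whereas the real entry point is expanded by the JIT and involves no byref at all. Compiling it requires unsafe code to be enabled (AllowUnsafeBlocks).

```csharp
using System;
using System.Threading;

public static unsafe class SyncBitsSketch
{
    // Hypothetical stand-in for the internal int*-based entry point.
    // The signature takes a raw pointer, so callers never form a GC-tracked
    // byref to the location. (This sketch forwards to the public API only
    // to stay runnable; the real intrinsic does not do this.)
    public static int CompareExchange(int* location, int value, int comparand)
        => Interlocked.CompareExchange(ref *location, value, comparand);

    // Tries once to set the given bits in the word at 'header'.
    // Returns false if another thread changed the word in the meantime.
    public static bool TrySetBits(int* header, int bits)
    {
        int oldBits = *header;
        return CompareExchange(header, oldBits | bits, oldBits) == oldBits;
    }

    // Single-threaded demonstration on a stack local, which needs no pinning.
    public static int Demo()
    {
        int word = 0b0001;
        TrySetBits(&word, 0b0100);
        return word; // 0b0101 when the compare-exchange succeeds
    }
}
```

In real runtime code the pointer would refer to an object's header word, which is exactly the case where a GC-tracked byref is unsafe; the stack local here only keeps the sample self-contained.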